Doctoral Dissertation Rapid Unsupervised Speaker Adaptation Based on Sufficient Statistics of Hidden Markov Models

Author

  • Randy Gomez
Abstract

Realizing a speech recognition system that is robust to speaker variation requires an efficient adaptation algorithm. Most adaptation techniques require a large amount of adaptation data to carry out an adaptation task. Adaptation data are often collected from the actual speaker over several utterances. Given the time needed to gather and transcribe the adaptation utterances, together with the execution time of the adaptation algorithm itself, real-time speech recognition is difficult to realize. We propose a novel approach to the problem that hinders practical implementation of speaker adaptation: using only a single untranscribed utterance from the user. This unsupervised speaker adaptation approach executes in a few seconds with a significant improvement in recognition performance compared to data-greedy and time-consuming adaptation schemes. This thesis details the science behind the development and implementation of rapid unsupervised speaker adaptation based on Hidden Markov Model Sufficient Statistics (HMM-Sufficient Statistics). In this approach, we process the training database into HMM-Sufficient Statistics offline, in advance. During the actual adaptation (online), the process starts with the selection of the N-best speakers who are acoustically close to the user's utterance. The HMM-Sufficient Statistics of the N-best speakers are selected as adaptation data. Because the HMM-Sufficient Statistics are precomputed offline, a considerable amount of computation time is saved and reallocated to adaptation platforms that perform well but are computationally expensive. The end result is a rapid adaptation system with good recognition performance.
Experiments using Vocal Tract Length Normalization (VTLN), Maximum A Posteriori (MAP) and Maximum Likelihood Linear Regression (MLLR) were performed. Moreover, we tested for robustness under noisy conditions such as office, car, crowd and booth noise at several signal-to-noise ratios (SNRs). In this thesis, we successfully designed a rapid unsupervised speaker adaptation method that requires only a single arbitrary utterance without transcription and executes in 7 seconds of adaptation time. The proposed method is suitable for speech recognition applications where adaptation data are scarce and execution time is critical. Furthermore, we have fully integrated the proposed approach into a real application, a dialogue system in which the adaptation technique interacts freely with the recognizer and several other processes in a real environment.

Doctoral Dissertation, Department of Information Processing, Graduate School of Information Science, Nara Institute of Science and Technology, NAIST-IS-DD0361218, October 1, 2006.
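The offline/online split described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes a single diagonal Gaussian per training speaker in place of full HMMs, and the function names (`select_n_best`, `combine_statistics`), the speaker ids and the toy numbers are all hypothetical. It shows only the general idea that per-speaker sufficient statistics (occupancy, first-order and second-order sums) can be stored in advance and pooled for the N-best speakers at adaptation time.

```python
import math

def log_likelihood(frames, mean, var):
    """Total log-likelihood of feature frames under one diagonal Gaussian."""
    total = 0.0
    for x in frames:
        for xi, m, v in zip(x, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
    return total

def select_n_best(frames, speaker_models, n):
    """Return the ids of the n speakers whose model scores the utterance highest."""
    ranked = sorted(
        speaker_models,
        key=lambda sid: log_likelihood(frames, *speaker_models[sid]),
        reverse=True,
    )
    return ranked[:n]

def combine_statistics(stats_list):
    """Pool (occupancy, first-order sum, second-order sum) and re-estimate mean/variance."""
    dim = len(stats_list[0][1])
    occ = sum(s[0] for s in stats_list)
    first = [sum(s[1][d] for s in stats_list) for d in range(dim)]
    second = [sum(s[2][d] for s in stats_list) for d in range(dim)]
    mean = [f / occ for f in first]
    var = [s / occ - m * m for s, m in zip(second, mean)]
    return mean, var

# Offline (assumed toy data): one Gaussian (mean, variance) per training speaker,
# plus the stored sufficient statistics for that speaker.
speaker_models = {
    "spk_a": ([0.0], [1.0]),
    "spk_b": ([1.0], [1.0]),
    "spk_c": ([5.0], [1.0]),
}
speaker_stats = {
    "spk_a": (10.0, [0.0], [10.0]),
    "spk_b": (10.0, [10.0], [20.0]),
    "spk_c": (10.0, [50.0], [260.0]),
}

# Online: a single untranscribed utterance (one-dimensional toy features).
utterance = [[0.2], [0.4]]
n_best = select_n_best(utterance, speaker_models, 2)
adapted_mean, adapted_var = combine_statistics([speaker_stats[s] for s in n_best])
```

Because the statistics are stored rather than recomputed, the online step reduces to a ranking pass and a few sums. In a real system the same pooling would be applied per HMM state and mixture component.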


Similar resources

Rapid unsupervised speaker adaptation based on multi-template HMM sufficient statistics in noisy environments

This paper describes a multi-template unsupervised speaker adaptation based on HMM-Sufficient Statistics. Multiple class-dependent models based on gender and age are used to improve adaptation performance while keeping adaptation time within a few seconds with just one arbitrary utterance. Adaptation begins with the estimation of the speaker's class from the N-best neighbor speakers using Gaussia...


Spectral subtraction in noisy environments applied to speaker adaptation based on HMM sufficient statistics

Noise and speaker adaptation techniques are essential to realize robust speech recognition in real noisy environments. In this paper, we applied spectral subtraction to an unsupervised speaker adaptation algorithm in noisy environments. The adaptation algorithm consists of the following five steps. (1) Spectral subtraction is carried out for the noise-added database. (2) Noise-matched acoustic mod...


Unsupervised speaker adaptation based on sufficient HMM statistics of selected speakers

This paper describes an efficient method for unsupervised speaker adaptation. This method is based on (1) selecting a subset of speakers who are acoustically close to a test speaker, and (2) calculating adapted model parameters according to the previously stored sufficient HMM statistics of the selected speakers’ data. In this method, only a few unsupervised test speaker’s data are required for...


Evaluation on unsupervised speaker adaptation based on sufficient HMM statistics of selected speakers

This paper describes an efficient method of unsupervised speaker adaptation. This method is based on (1) selecting a subset of speakers who are acoustically close to a test speaker, and (2) calculating adapted model parameters according to the previously stored sufficient statistics of the selected speakers’ data. In this method, only a few unsupervised test speaker’s data are necessary for the...


On-line hierarchical transformation of hidden Markov models for speaker adaptation

This paper presents a novel framework of on-line hierarchical transformation of hidden Markov models (HMM’s) for speaker adaptation. Our aim is to incrementally transform (or adapt) all the HMM parameters to a new speaker even though part of HMM units are unseen in adaptation data. The transformation paradigm is formulated according to the approximate Bayesian estimate, which the prior statisti...




Publication date: 2006